1,244 research outputs found

Beyond the Zipf-Mandelbrot law in quantitative linguistics

Authors: Cohen, Denisov, Li, Mandelbrot, Marcelo A. Montemurro, Pietronero, Simon, Tsallis, Zipf
Publication venue: Elsevier BV
Publication date: 01/01/2001

In this paper the Zipf-Mandelbrot law is revisited in the context of linguistics. Despite its widespread popularity, the Zipf-Mandelbrot law can only describe the statistical behaviour of a rather restricted fraction of the total number of words contained in a given corpus. In particular, we focus on the important deviations that become statistically relevant as larger corpora are considered and that could ultimately be understood as salient features of the underlying complex process of language generation. Finally, it is shown that all the different observed regimes can be accurately encompassed within a single mathematical framework recently introduced by C. Tsallis. Comment: 6 pages and 7 figures; minor changes in text, added reference
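
For orientation, the Zipf-Mandelbrot law named in this abstract is commonly written in the form below. The symbols f, r, C, b and \alpha are conventional notation assumed here; they are not taken from the paper, whose own parametrisation may differ.

```latex
f(r) = \frac{C}{(r + b)^{\alpha}}, \qquad r = 1, 2, 3, \dots
```

Here f(r) is the frequency of the word of rank r. Setting b = 0 with \alpha close to 1 recovers Zipf's original rank-frequency law.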

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Research Online (The Open University)

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

The University of Manchester - Institutional Repository

Federal Supremacy in the Regulation of Nuclear Energy: Where Do Punitive Damages Lie?

Author: Zipf Edwin A.
Publication venue: eRepository @ Seton Hall
Publication date: 27/09/2022

bepress Legal Repository

Seton Hall University Libraries

Testing the robustness of laws of polysemy and brevity versus frequency

Authors: A Corral, A Kilgarriff, B MacWhinney, C Fellbaum, EG Altmann, F Font-Clos, G Fenk-Oczlon, GK Zipf, J Baixeries, J Ke, M Razavi, N Ide, R Ferrer-i-Cancho, R Newson, RH Baayen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

The pioneering research of G.K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. Here we focus on two of them: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. We evaluate the robustness of these laws in contexts where, to our knowledge, they have not yet been explored. The recovery of the laws in these new conditions supports the hypothesis that they originate from abstract mechanisms. Peer reviewed. Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Investigating people: a qualitative analysis of the search behaviours of open-source intelligence analysts

Authors: Finch E., Guha R., Hanbury A., Holland B. R., Liu C., Pirolli P., Zipf G.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

The Internet and the World Wide Web have become integral parts of the lives of many modern individuals, enabling almost instantaneous communication, sharing and broadcasting of thoughts, feelings and opinions. Much of this information is publicly facing, and as such, it can be utilised in a multitude of online investigations, ranging from employee vetting and credit checking to counter-terrorism and fraud prevention/detection. However, the search needs and behaviours of these investigators are not well documented in the literature. To address this gap, an in-depth qualitative study was carried out in cooperation with a leading investigation company. The research contribution is an initial identification of Open-Source Intelligence investigator search behaviours and of the procedures and practices that they undertake, along with an overview of the difficulties and challenges that they encounter as part of their domain. This lays the foundation for future research into the varied domain of Open-Source Intelligence gathering.

Crossref

Enlighten

Zipf's Law in Gene Expression

Authors: G. K. Zipf, I. Xenarios, P. Bak, V. A. Kuznetsov, V. E. Velculescu
Publication venue: American Physical Society (APS)
Publication date: 30/09/2002

Using data from gene expression databases on various organisms and tissues, including yeast, nematodes, human normal and cancer tissues, and embryonic stem cells, we found that the abundances of expressed genes exhibit a power-law distribution with an exponent close to -1, i.e., they obey Zipf's law. Furthermore, by simulations of a simple model with an intra-cellular reaction network, we found that Zipf's law of chemical abundance is a universal feature of cells where such a network optimizes the efficiency and faithfulness of self-reproduction. These findings provide novel insights into the nature of the organization of reaction dynamics in living cells. Comment: revtex, 11 pages, 3 figures, submitted to Phys. Rev. Lett.
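
A minimal sketch of how an exponent "close to -1" can be read off rank-abundance data, assuming an ordinary least-squares fit on the log-log rank-abundance plot. The function name `zipf_exponent` and the synthetic abundances are hypothetical; this is not the authors' procedure, and the abstract does not say how the exponent was estimated.

```python
import math


def zipf_exponent(abundances):
    """Estimate the slope of log(abundance) versus log(rank) by least squares.

    Hypothetical helper for illustration only; not the method of the paper.
    """
    # Rank items from most to least abundant (rank 1 = most abundant).
    ranked = sorted((a for a in abundances if a > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive abundances")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(a) for a in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope: cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic abundances that follow abundance = 1000 / rank exactly.
synthetic = [1000.0 / rank for rank in range(1, 201)]
slope = zipf_exponent(synthetic)
print(f"fitted exponent: {slope:.3f}")
```

On data that follow abundance proportional to 1/rank, the fitted slope should come out very close to -1. A straight-line fit on log-log axes is a crude estimator for real, noisy data, so it is best treated as a first look.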

arXiv.org e-Print Archive

Crossref

CERN Document Server

Network properties of written human language

Authors: A. P. Masucci, G. J. Rodgers, G. K. Zipf, G. Orwell, H. A. Simon, S. N. Dorogovtsev, W. Li
Publication venue: American Physical Society (APS)
Publication date: 08/05/2006

We investigate the nature of written human language within the framework of complex network theory. In particular, we analyse the topology of Orwell's 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find a composite power-law behavior for both the average nearest-neighbor degree and the average clustering coefficient as a function of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that the second-order vertex correlations are an essential component of the network architecture. To model our empirical results we extend a previously introduced model for language due to Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment and the random growth of a pre-determined small finite subset of initial vertices. We find that with these elementary stochastic rules we are able to produce a network showing syntactic-like structures.

arXiv.org e-Print Archive

Crossref

UCL Discovery

CERN Document Server

Brunel University Research Archive

Universal scaling in sports ranking

Authors: Alain Bulou, Aoyama H, Baek S K, Ben-Naim E, Bradley R A, Minnhagen P, Pareto V, Qiuping A Wang, Toral R, Wei Li, Weibing Deng, Xu Cai, Zipf G K
Publication venue: IOP Publishing
Publication date: 12/11/2011

Ranking is a ubiquitous phenomenon in human society. On the web pages of Forbes, you may find all kinds of rankings, such as the world's most powerful people, the world's richest people, the top-paid tennis stars, and so on. Here we study a specific kind: sports ranking systems, in which players' scores and prize money are calculated based on their performances in various tournaments. A typical example is tennis. It is found that the distributions of both scores and prize money follow universal power laws, with exponents nearly identical for most sports. In order to understand the origin of this universal scaling we focus on the tennis ranking systems. By checking the data we find that, for any pair of players, the probability that the higher-ranked player will top the lower-ranked opponent is proportional to the rank difference between the pair. Such a dependence can be well fitted to a sigmoidal function. Using this feature, we propose a simple toy model which can simulate the competition of players in different tournaments. The simulations yield results consistent with the empirical findings. Extensive studies indicate that the model is robust with respect to modifications of its minor parts. Comment: 8 pages, 7 figures

arXiv.org e-Print Archive

Crossref

Bidding process in online auctions and winning strategy: rate equation approach

Authors: B. Kahng, G. K. Zipf, H. A. Simon, I. Yang, J. P. Bouchard, R. N. Mantegna, V. Pareto
Publication venue: American Physical Society (APS)
Publication date: 08/11/2005

Online auctions have expanded rapidly over the last decade and have become a fascinating new type of business or commercial transaction in this digital era. Here we introduce a master equation for the bidding process that takes place in online auctions. We find that the number of distinct bidders who bid k times, called the k-frequent bidder, up to the t-th bidding progresses as n_k(t) \sim t k^{-2.4}. The successfully transmitted bidding rate by the k-frequent bidder is obtained as q_k(t) \sim k^{-1.4}, independent of t for large t. This theoretical prediction is in agreement with empirical data. These results imply that bidding at the last moment is a rational and effective strategy to win in an eBay auction. Comment: 4 pages, 6 figures

arXiv.org e-Print Archive

Crossref

CERN Document Server

Complex network analysis of literary and scientific texts

Authors: Andrzej Kulig, Barabási A.-L., Christiansen M. H., Iwona Grabska-Gradzińska, Jarosław Kwapień, Stanisław Drożdż, Zipf G. K.
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 21/05/2012

We present results from our quantitative study of statistical and network properties of literary and scientific texts written in two languages: English and Polish. We show that Polish texts are described by the Zipf law with a scaling exponent smaller than the one for the English language. We also show that the scientific texts are typically characterized by rank-frequency plots with a relatively short range of power-law behavior compared to the literary texts. We then transform the texts into their word-adjacency network representations and find another difference between the languages. For the majority of the literary texts in both languages, the corresponding networks revealed a scale-free structure, while this was not always the case for the scientific texts. However, all the network representations of texts were hierarchical. We do not observe any qualitative or quantitative difference between the languages. However, if we look at other network statistics like the clustering coefficient and the average shortest path length, the English texts appear to possess a more clustered structure than the Polish ones. This result was attributed to differences in the grammar of the two languages, which was also indicated in the Zipf plots. All the texts, however, show network structure that differs from any of the Watts-Strogatz, Barabási-Albert, and Erdős-Rényi architectures.
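
A minimal sketch of a word-adjacency network of the kind mentioned in this abstract, assuming that vertices are distinct words and that an undirected edge joins two words whenever they appear next to each other in the text. The tokenisation rule, the sample sentence and the function name `word_adjacency_network` are assumptions for illustration; the abstract does not specify the authors' construction.

```python
import re
from collections import defaultdict


def word_adjacency_network(text):
    """Map each word to the set of words that occur directly next to it.

    Hypothetical helper: lower-cases the text and keeps runs of letters only.
    """
    # Runs of Unicode letters, so words with diacritics stay in one piece.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    neighbours = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore self-loops from immediately repeated words
            neighbours[left].add(right)
            neighbours[right].add(left)
    return dict(neighbours)


sample = "the cat sat on the mat and the dog sat on the rug"
network = word_adjacency_network(sample)
# Vertex degree = number of distinct neighbouring words.
degrees = {word: len(adjacent) for word, adjacent in network.items()}
for word, degree in sorted(degrees.items(), key=lambda item: -item[1]):
    print(word, degree)
```

The degree dictionary is the starting point for the statistics named in the abstract, such as the degree distribution; the clustering coefficient and shortest path lengths would need further code or a graph library.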

arXiv.org e-Print Archive

Crossref

Highlighting Current Trends in Volunteered Geographic Information

Authors: Antonio V., Jonietz D., See L., Zipf A.
Publication venue: MDPI Publishing; International Society for Photogrammetry and Remote Sensing (ISPRS)
Publication date: 01/07/2017

Volunteered Geographic Information (VGI) is a growing area of research. This Special Issue aims to capture the main trends in VGI research based on 16 original papers, and distinguishes between two main areas: papers that deal with the characteristics of VGI and papers focused on applications of VGI. The topic of quality assessment and assurance dominates the papers on VGI characteristics, whereas application-oriented work covers three main domains: human behavioral analysis, natural disasters, and land cover/land use mapping. This Special Issue therefore addresses both the challenges and the potential of VGI.

ZENODO

Directory of Open Access Journals

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

International Institute for Applied Systems Analysis (IIASA)